ABSTRACT


Information cascading, also known as epidemic influence diffusion, 
is ubiquitous in social networks and vital for interpreting diverse 
social phenomena. Determining the (asymptotic) number of seed
nodes required to spread a piece of information to virtually all
nodes in a social network graph through information cascading is
a well-studied research problem. However, most prior works study
this problem only on a static graph (i.e., a snapshot of the social
network). In this work, we study this problem in a more realistic
setting: During the process of the information cascading, which 
could take a nontrivial amount of time (say months), the social 
network graph grows with the arrival of new nodes (newcomers) 
and their edges (social connections). This dynamic graph setting is 
a game changer since it brings about insights that cannot be learned 
in the static graph setting. For example, we find that newcomers 
command more influence power (ability to spread information) than 
existing nodes, since their social connections are “more interesting”. 
We have studied this problem under two different graph topology
(and growth) models, namely Erdős-Rényi (ER) and preferential
attachment (PA), and three cascading models, and have obtained
analytical results for all six combinations. Through these analytical
results, we have gained considerable understanding of the collaborative
impact of both models on the information cascading process.
For example, the ER growth model exhibits a ‘reactivation’
phenomenon, whereas in the PA growth model later seeds can take
the place of early seeds. Our theoretical claims are further validated
by experiments on both simulated and real social networks.


1 INTRODUCTION 


Ideas, technologies, diseases and choices can spread among people 
via social interactions, which in some cases can lead to a cascading 
behavior. Cascade is an epidemic process that begins with a set of 
seed nodes (i.e., seeds) that, by influencing other nodes and having 
the influenced nodes further propagate this influence, eventually 
leads to a massive outbreak across the network. This cascading pro- 
cess has been used to explain and/or to induce various phenomena 
in online social networks (OSN), such as viral marketing, smoking
cessation and maintenance [1], micro-finance loans [2], and political
campaigning via OSN [3]. Due to its practical importance, this
cascading process has been a heavily studied research topic in the 
past few years. 

An important research question along this topic is to determine, 
in terms of asymptotic order, the minimum number of seeds re- 
quired to cause such an outbreak whose size is comparable to the 
whole network size. This research question has many real-life appli- 
cations. For example, in OSN-based viral marketing, the marketer 
conceivably would like to influence, directly or indirectly, the vast 
majority of users in the OSN. While there have been considerable re- 
search studies on this research problem alone, most of them assume 
that the (OSN) graph is static during the information cascading 
process. This assumption is somewhat detached from reality, since


the cascading process can take a nontrivial amount of time (say 
months), and in the meantime, the social network graph can grow 
considerably larger [4] with the arrival of new nodes (newcomers) 
and their edges (social connections). While readers might wonder 
whether analytical results under the static graph assumption can 
be suitably adapted (e.g., scaled by a growth factor) for dynamic 
graph scenarios, such adaptations are unlikely to work well, since 
they implicitly assume that existing users and newcomers have 
statistically identical influence powers, which we will show is not 
true. 

In this work, we study information cascading in evolving (i.e.,
growing) social networks, focusing on the aforementioned research 
question of determining the minimum number of seeds required 
to induce an outbreak. Unlike in a static network where the seeds 
are determined upfront, in an evolving network seeds are usually 
selected on a continuous basis along with the network growth. For 
example, in viral marketing, often a new batch of seeds is resampled 
after each promotion period for the advertising to benefit from 
newcomers and their “refreshing” social connections, so that it can 
go beyond the old social circles of existing users to reach an ever 
larger population. 

In this work, we choose seeds (uniformly) randomly, as opposed
to most prior works, which try to find the best seeds strategically,
for the following reason. Random seed selection is often
much more practically feasible than its strategic alternatives, such 
as that guided by the network structure information, since such 
information or knowledge can be extremely costly to acquire in 
field settings. Indeed, getting such information, such as the social 
connections between people in the network, is difficult, and influ- 
encing a specific (strategic) node/person can be very costly. For 
example, according to a recent estimation [5], conducting network 
surveys in 120 Indian villages would cost approximately $190,000
and take over eight months. Furthermore, as shown in [6], in terms 
of cascading effectiveness, random seeding performs just as well as 
network-structure-guided seeding, and hence is strongly preferred 
in real-life situations with budgetary considerations. 

We have found that the number of seeds necessary to cause an 
outbreak depends primarily on two factors: the information cas- 
cading model and the network topology. Both are indispensable: 
The former characterizes how a node is influenced by its neighbors 
and the latter to a large degree determines how far the information 
dissemination (i.e., influence spreading) process can go. In com- 
bination, they determine the scope and the rate of the diffusion 
process. In this work, we will consider all (six) combinations of 
three cascading models and two topology models. 

A. Cascading Model: Generalizing k-complex contagion Models 


The three cascading models we use in this work are the k-complex 
contagion model and two of its generalizations. In the k-complex 
contagion model, a node becomes infected when it is subject to 
the influence of at least k of its neighbors, where k is the influence 
threshold that is universal (i.e., for every node) across the network. 


This model, widely used, is known to fairly accurately capture the 
phenomenon of multiple confirmation in many real-life scenarios, 
such as the adoption of expensive medical innovation, the decision 
to participate in a migration and changes in social behaviors [9][10].
For example, a study on Facebook showed that having two or more
real-life friends already on Facebook substantially increases the
probability of joining Facebook [11], and a similar phenomenon was
observed in a study on Twitter [12]. For another example, statistics
from DBLP and LiveJournal (LJ) [13] suggest that the second
affected neighbor can often contribute more to influence spreading 
than the first. 

A limitation of the k-complex contagion model is that the fixed
threshold k does not capture the fact that individual nodes have
different “risk tolerance levels”: In real life, some people are
risk-taking (i.e., more willing to try out new things) while others
are risk averse. Hence, we generalize this model by making the
threshold k of each node a random variable with a distribution
$\mathcal{F}$. We use two different distributions in this work: uniform and
Poisson. The former statistically models the maximum possible
diversity in individual risk-tolerance levels, whereas the latter has
been empirically shown in [15] to fairly accurately capture the actual
diversity in practice. These two generalized models, together with
the original k-complex contagion model, are the three cascading
models used in our analysis in the sequel.


B. Network Model: Evolving Topologies 

As mentioned earlier, for our analysis (of the minimum seed size), 
we need also to assume a network topology (growth) model that 
specifies how the network graph grows with the arrivals of newcomers.
In this work, we consider two such models: the Erdős-Rényi (ER)
graph [16] and the Preferential Attachment (PA) model [17]. In
an ER graph, a newcomer connects to each of the existing nodes
with a fixed probability p. Under this growth rule, latecomers are
expected to have more initial edges. In a PA network, by contrast,
each node has the same number of initial edges, and a new node
connects to each existing node with a probability proportional to its
degree. The resulting network has a power-law degree distribution
that is commonly seen in web graphs and academic networks. In
the OSN context, the ER model assumes indiscriminate selection of
new acquaintances by each newcomer, whereas the PA model assumes
a preference toward those who have more acquaintances.


C. Our Results 

We now specify the quantity we would like to analyze more 
precisely. For a target future graph size t (i.e., with t nodes at a 
future time), we want to know the minimum (asymptotic) number 
of seeds, uniformly randomly sampled from these t nodes over time 
(while the network is growing towards the target size t), that can 
influence the vast majority of the nodes in the network (i.e., cause 
an outbreak) before or when the network reaches the target size t. 
We perform this analysis under all aforementioned six combinations 
of network topology models and cascading models, in an effort to 
obtain a comprehensive understanding of the relation between this 
minimum seed size and the target network size t. 

A key observation we can make from these analyses is that, in 
nearly all six cases, newcomers (i.e., late birds) are generally more 
important in influence spreading. Besides that, other surprising
results are summarized as follows: 


(1) In ER graphs, when the influence threshold k is a random
variable, the required seed size decreases with the target network
size t and approaches a constant when t tends to infinity. We will
show that this counter-intuitive finding confirms the aforementioned
insight that latecomers have more influence power.

(2) In ER graphs, with nonzero probability, the diffusion process
may get stuck, in the sense that no more nodes can be further
influenced, before or when the network grows to the target size t.
Under such circumstances, the process can be “rejuvenated” by
recruiting a sufficiently large number of additional nodes into the
network (beyond the target size t), even when none of these
additional nodes is a seed. This phenomenon further demonstrates
the disproportionate influence power of the late birds.

(3) In PA networks, the required minimum seed size grows with
the target network size t, so in a sense the PA model is less favorable
to the information diffusion process. In the worst case, this seed
size can be O(√t). However, even in this case, the seed size as a
percentage of the target network size t decreases as t grows, which
again confirms the information diffusion power of late birds.

Our analyses of the minimum required seed size are further
confirmed by experiments on simulated and real networks. Due to
space limitations, many detailed proofs and derivations are omitted
and placed in our full version [31]. To the best of our knowledge,
this is the first effort towards understanding information cascading
in temporally growing networks. By characterizing asymptotically
how seed size governs cascading during network evolution, we
believe our results are of fundamental importance for better
understanding the dynamics of epidemics in real systems.


2 RELATED LITERATURE 

In this section, we give a brief review of related research on
information diffusion. Generally speaking, efforts have been made
in two directions: cascading and influence maximization. Since both
are classic research fields with fruitful achievements, we list only
the prior works closest to ours.

Cascading research focuses on revealing diffusion patterns in given
settings. Diffusion in PA networks under the independent cascade
(IC) or linear threshold (LT) model has been extensively studied
[18][19]. When the initial seeds are chosen uniformly at random,
the diffusion is also called bootstrap percolation [20]. Bootstrap
percolation has been examined in various networks, including ER
graphs [21]. [22] studies the problem of finding a set whose infection
would trigger an influence scale comparable to the entire population
in a static network. [15] and [23] analyse cascades in time-evolving
networks under the k-complex contagion model and are the most
relevant to our work, yet their conclusions are drawn from rather
different perspectives.

Influence maximization (IM), by contrast, concentrates on algorithm
design: it aims to maximize the number of influenced members by
determining an optimal seed set of a fixed size. First introduced by
Kempe et al., IM has been studied in countless works from multiple
angles. Most algorithms are designed for IM in static contexts.
Recent research has focused on proposing improvements and
modifications under various constraints. For example, [7] and [8]
address algorithm scalability and propose solutions for billion-scale
networks. [14] analyses IM under discount constraints. Meanwhile,
a new class of IM problems has emerged in which seeds are chosen
periodically in dynamic networks, which is most similar to our
setting. [24] studies IM in networks with changing edges and fixed
nodes. [25] and [26] periodically select seeds according to influence
estimates based on feedback from previous diffusions via a
bandit-based approach.

To conclude, there have been extensive studies in both cascading and
IM, but to the best of our knowledge, no attempt has been made
to determine the average number of seeds required to expand
influence to the whole network in a dynamic context.


3 PRELIMINARIES 
3.1 Cascading Model 


We adopt the k-complex contagion model to describe influence
diffusion in undirected graphs. We first introduce several node
notions used in later sections, and then define the generalized
k-complex contagion model.

Definition 3.1. (‘Blank’ Node, ‘Influenced’ Node, ‘Influenced’
Degree, Seed): At a given time stamp, ‘blank’ nodes are nodes
that have not yet been influenced, and ‘influenced’ nodes are nodes
that have already been influenced. A ‘blank’ node can be turned
into an ‘influenced’ node either during initialization or later by
its neighbors. We call each unit of degree held by an ‘influenced’
node an ‘influenced’ degree. For example, if the only ‘influenced’
node has ten incident edges, then we say the network has ten
‘influenced’ degrees. Seeds are nodes that are influenced during
initialization; they are the source of the information cascading.

Definition 3.2. (k-Complex Contagion Model): Given an undirected
graph G generated by an evolving process, a k-complex contagion
$CC(G, p_0, R_u, \mathcal{F})$ is a contagion initiated by a $p_0$ proportion
of nodes (the seed proportion during initialization) that spreads
across network G. The information diffusion proceeds in rounds.
$R_u$ is a random variable that follows distribution $\mathcal{F}$ and
represents the ‘influence threshold’ k of node u. The ‘influence
threshold’ of each node is generated in round 0 and is fixed during
information diffusion. At the end of each round, every node with
at least k influenced neighbors becomes influenced.
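The round-based dynamics of Definition 3.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a static graph stored as an adjacency dict and thresholds that have already been drawn in round 0; the function name `k_complex_contagion` and the example graph are hypothetical.

```python
from typing import Dict, List, Set

def k_complex_contagion(adj: Dict[int, List[int]],
                        threshold: Dict[int, int],
                        seeds: Set[int]) -> Set[int]:
    """Round-based (generalized) k-complex contagion on a static graph.

    adj:       undirected adjacency lists
    threshold: influence threshold k of each node, drawn once in round 0
    seeds:     nodes influenced during initialization
    Returns the final set of influenced nodes.
    """
    influenced = set(seeds)
    while True:
        # One synchronous round: every blank node with at least k
        # influenced neighbours becomes influenced at the end of the round.
        newly = {v for v in adj
                 if v not in influenced
                 and sum(u in influenced for u in adj[v]) >= threshold[v]}
        if not newly:          # no change: the cascade has stopped
            return influenced
        influenced |= newly


# Hypothetical 4-node example with heterogeneous thresholds.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
thr = {0: 1, 1: 2, 2: 2, 3: 1}
print(k_complex_contagion(adj, thr, {0}))      # stalls at the seed
print(k_complex_contagion(adj, thr, {0, 1}))   # spreads to every node
```

With a single seed, nodes 1 and 2 each see only one influenced neighbour and the cascade stops immediately; a second seed pushes node 2 over its threshold of 2, after which node 3 (threshold 1) follows.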

The k-complex contagion model is a general model that captures
the influence spreading dynamics in a network. Take the
popularization of smartphones as an example. When smartphones
were first introduced, people using traditional phones were ‘blank’
nodes and early adopters of smartphones were seeds. Consider a
person who was still using a traditional phone; he was a ‘blank’
node. One day, he found one of his friends using a smartphone and
heard positive remarks about it from this friend. Several weeks
later, he found many more of his friends using smartphones and
heard many more positive evaluations. He was finally influenced by
his friends and decided to buy a smartphone, thereby becoming an
‘influenced’ node. Further, if he found the smartphone convenient
and recommended it to others, he would be contributing to the
popularization of the smartphone, i.e., to the information diffusion.
In this example, the influence threshold k is the number of friends
using smartphones at the moment a person decides to switch to one.

In this paper, we go beyond the traditional k-complex contagion
model, which simply assumes that each node has the same ‘influence
threshold’ k. The generalized k-complex contagion model we adopt
better reflects reality in that k varies from individual to individual:
some people tend to go with the stream, while others are likely to
stick to their own ideas. We consider two different distributions
$\mathcal{F}$ of the ‘influence threshold’: the uniform distribution and the
Poisson distribution. The notation $R_u \sim \mathcal{F}$ specifies that the
‘influence threshold’ k follows distribution $\mathcal{F}$.


3.2 Graph Model 


An evolving general network is a network whose evolution dynamics
are identical to those of a real network. The following theorem shows
the difficulty of our problem in an evolving network with general
topology.

THEOREM 3.3. Finding the minimum number of seeds to cascade 
the influence to the whole evolving general network is an NP-hard 
problem. 

The main idea of the proof is to reduce this problem to bond
percolation, which can be further converted to the Set Cover
Problem. The full proof of Theorem 3.3 can be found in the full
version [31]. Given the NP-hardness of the problem, we instead
study information diffusion under two specific evolving network
models, i.e., the evolving Erdős-Rényi (ER) graph and the evolving
Preferential Attachment (PA) network.

An ER network, alternatively denoted as a $G(n, p)$ graph, has
served as the basic model in a large body of literature due to its
mathematical tractability. In the $G(n, p)$ model, there are n nodes
in total and an edge exists between each node pair independently
with probability p. p can also be interpreted as graph density, as an
increase in p yields a denser graph. Based on the static ER model,
we define its evolving version below.

Definition 3.4. (Evolving Erdős-Rényi Model): In the Evolving
Erdős-Rényi model (evolving ER model), the network grows at a
uniform rate, with one new node added to the network at each time
slot. Each new node emits an edge to every existing node
independently with the same probability p. In other words, each
new node is expected to emit a total of tp edges upon arrival, with t
being the network size at the moment the new node is added. The
evolving ER model incorporates the growth nature of social networks.
As p stays fixed and the network continues to expand, later nodes
have more initial edges than earlier ones.
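A short simulation makes the growth rule of Definition 3.4 concrete and shows why later nodes start with more edges. This is a sketch under our own conventions: nodes are 0-indexed, so node t arrives when the network already has t nodes, and the function name `evolving_er` is hypothetical.

```python
import random
from typing import List

def evolving_er(n: int, p: float, rng: random.Random) -> List[List[int]]:
    """Grow an evolving ER graph to n nodes, one node per time slot.

    Node t (0-indexed) arrives when the network has t nodes and links to
    each of them independently with probability p, so it emits t*p edges
    in expectation upon arrival.
    """
    adj: List[List[int]] = []
    for t in range(n):
        nbrs = [u for u in range(t) if rng.random() < p]
        adj.append(nbrs)
        for u in nbrs:
            adj[u].append(t)   # store the undirected edge on both ends
    return adj


adj = evolving_er(2000, 0.01, random.Random(42))
# Initial edges of node t = neighbours that were already present (u < t).
initial = [sum(u < t for u in adj[t]) for t in range(len(adj))]
early = sum(initial[:100]) / 100    # expected value about 0.5
late = sum(initial[-100:]) / 100    # expected value about 19.5
print(f"early arrivals: {early:.2f} initial edges, late arrivals: {late:.2f}")
```

Because p is fixed, the expected number of initial edges grows linearly with the arrival index, which is the source of the late birds' extra influence power discussed in Section 4.1.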

While ER graphs reflect an equal chance of connection among users,
in other real situations newcomers tend to connect to those of
higher popularity. On Twitter, for instance, users are more inclined
to follow the account of an influential figure like Taylor Swift or a
big organization like China Daily. The same holds in academic
networks, where papers with more citations have a higher chance of
being further cited. This common biased connection choice is well
captured by the celebrated Preferential Attachment model. For the
evolving networks of interest, we formally define the Evolving
Preferential Attachment Model as follows.

Definition 3.5. (Evolving Preferential Attachment Model): In the
Evolving Preferential Attachment Model (the evolving PA model)
$PA_{m,n}(V, E)$, new nodes come into the network in sequence. One
new node arrives at every time slot and emits m edges to the
existing nodes. The probability that an existing node v connects to
the latest node is proportional to its degree $\deg(v)$ (the preferential
attachment rule). Note that $\sum_{v \in V} \deg(v) = 2mn$. Therefore,
earlier nodes, which have had more time to accumulate edges, are
expected to gain higher degrees than nodes that arrive later. The
generated network has a power-law degree distribution.
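The preferential attachment rule can be sketched with the common "repeated node list" trick, in which each node appears in a pool once per incident edge, so a uniform draw from the pool picks node v with probability proportional to deg(v). Two details are our own assumptions and are not fixed by Definition 3.5: the process starts from a complete graph on m + 1 nodes, and each newcomer picks m distinct targets (by rejection), which only approximates independent degree-proportional draws. The name `evolving_pa` is hypothetical.

```python
import random
from typing import List

def evolving_pa(n: int, m: int, rng: random.Random) -> List[List[int]]:
    """Grow a preferential-attachment graph to n nodes (n > m + 1).

    Assumptions: start from a complete graph on m + 1 nodes; each
    newcomer attaches to m distinct existing nodes.
    """
    adj: List[List[int]] = [[u for u in range(m + 1) if u != v]
                            for v in range(m + 1)]
    # Node v appears deg(v) times, so rng.choice(pool) is degree-biased.
    pool = [v for v in range(m + 1) for _ in range(m)]
    for t in range(m + 1, n):
        targets = set()
        while len(targets) < m:           # reject repeats to keep edges simple
            targets.add(rng.choice(pool))
        adj.append(sorted(targets))
        for u in targets:
            adj[u].append(t)
            pool.extend((u, t))           # one new edge adds to both degrees
    return adj


g = evolving_pa(500, 3, random.Random(1))
degrees = [len(nbrs) for nbrs in g]
print("max degree among first 10 nodes:", max(degrees[:10]))
print("max degree among last 100 nodes:", max(degrees[-100:]))
```

Every newcomer enters with exactly m edges, while the earliest nodes keep accumulating edges, which is the "rich get richer" effect described above.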

As noted earlier, when considering the diffusion process in the two
network models above, we adopt random seeding that proceeds in
parallel with network evolution. Meanwhile, the ‘influence threshold’
k of each node is determined upon arrival and stays fixed. Thus, for a
target future graph size t (i.e., with t nodes at a future time), we aim
to derive the minimum (asymptotic) number of seeds, uniformly
randomly sampled from these t nodes over time (while the network
is growing towards the target size t), that can influence the vast
majority of the nodes in the network before or when the network
reaches the target size t.¹


4 MAIN RESULTS 


In this section, we present the main results of this paper, based on
the above definitions and statements, along with their intuitive
interpretations. The proofs are relegated to Section 5. We derive
the order of the seed quantity required to achieve an influence scale
comparable to or larger than the whole network under six settings.
Each setting is a combination of a network topology (an ER graph
or a PA graph) and a diffusion model (one of the three k-complex
contagion models).


4.1 Larger Network Size But Fewer Seeds 


Before presenting the main results in ER networks, we recall that
the ‘influence threshold’ $R_u$ always equals $k_0$ in the traditional
k-complex contagion model, but follows the distribution $\mathcal{F}$ in the
generalized k-complex contagion model. When $\mathcal{F}$ is the uniform
distribution, $R_u \in \{1, \dots, n-1\}$. When $\mathcal{F}$ is the Poisson
distribution, we normalize $\sum_{k=1}^{n-1} \frac{\lambda^k e^{-\lambda}}{k!}$ to 1.
With these preliminaries, we are able to draw conclusions as $n \to \infty$.
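The two threshold distributions can be sampled as below. This sketch reads the normalization as truncating the Poisson law to {1, ..., n − 1} and rescaling it to total mass 1; that reading, the parameter name `lam` for the Poisson rate, and the helper names are our assumptions for illustration only.

```python
import math
import random
from typing import List

def uniform_threshold(n: int, rng: random.Random) -> int:
    """Draw R_u uniformly from {1, ..., n-1}; its mean n/2 grows with n."""
    return rng.randint(1, n - 1)

def truncated_poisson_pmf(lam: float, upper: int) -> List[float]:
    """Poisson(lam) probabilities on {1, ..., upper}, rescaled to sum to 1."""
    # Work in log space: log P(k) = k*log(lam) - lam - log(k!)
    raw = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
           for k in range(1, upper + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def poisson_threshold(lam: float, n: int, rng: random.Random) -> int:
    """Draw R_u from the truncated, renormalized Poisson law on {1, ..., n-1}."""
    weights = truncated_poisson_pmf(lam, n - 1)
    return rng.choices(range(1, n), weights=weights, k=1)[0]


rng = random.Random(7)
print([uniform_threshold(50, rng) for _ in range(5)])
print([poisson_threshold(1.5, 50, rng) for _ in range(5)])
```

The contrast matters for Theorem 4.1: the uniform mean n/2 grows with the network, whereas the truncated Poisson mean stays close to the rate for large n, so typical thresholds remain small.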


THEOREM 4.1. Let $G(n, p)$ and $CC(G, p_0, R_u, \mathcal{F})$ represent the
evolving ER model and the k-complex contagion model, respectively.
In order to achieve an influence scope of the same order as the
whole network by time slot n, the probability that each new node
becomes a seed should satisfy:

(1) … under the traditional k-complex contagion model;

(2) $O\left(\frac{1-p}{1-O(p)}\right)$ under the generalized k-complex contagion
model where k follows the uniform distribution;

(3) … under the generalized k-complex contagion model where k
follows the Poisson distribution.

Theorem 4.1(2) shows that under the generalized k-complex contagion
model where $\mathcal{F}$ is the uniform distribution, the seed quantity
increases along with network expansion; moreover, it is comparable
to n. Since $R_u \in \{1, \dots, n-1\}$, the mean ‘influence threshold’
under the uniform distribution, n/2, grows as the network evolves,
which intuitively impedes the spread of influence.

In contrast, Theorem 4.1(1) and (3) imply that when k is fixed 
or follows a Poisson distribution, fewer seeds are needed for a 


¹ In the rest of the paper, we use t and n interchangeably.


network-wide influence scale as the network expands. In fact, an
order of … seeds is sufficient to trigger an influence range
comparable to the entire network. This seemingly counter-intuitive
phenomenon suggests the strong diffusion power of late birds. Recall
that in the evolving ER model, late birds are expected to emit more
edges to the network. The higher degrees of late birds contribute to
their strong cascading capability, which, in some cases, can lead to
a ‘reactivation’ phenomenon.

Definition 4.2. (‘Reactivation’ Phenomenon): The ‘reactivation’
phenomenon sometimes occurs in ER networks under the k-complex
contagion model. For simplicity, we adopt the 2-complex contagion
model and illustrate the situation in Figure 1. The diffusion is at an
impasse at some time slot (leftmost panel). In the next time slot, a
new node connects to the only two infected nodes and to two other
uninfected nodes. Thanks to the newcomer, the diffusion is
‘reactivated’ and eventually the whole network is influenced.
Figure 1: ‘Reactivation’ phenomenon. Late birds can restart the
influence diffusion.
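The reactivation scenario of Definition 4.2 can be reproduced on a tiny graph. The five-node graph below is a hypothetical example in the spirit of Figure 1, not the figure itself: under 2-complex contagion the cascade stalls with two influenced nodes, and a single blank newcomer wired to both influenced nodes and both blank nodes restarts it.

```python
from typing import Dict, List, Set

def cascade(adj: Dict[int, List[int]], k: int, seeds: Set[int]) -> Set[int]:
    """Round-based k-complex contagion with a common threshold k."""
    infected = set(seeds)
    while True:
        newly = {v for v in adj
                 if v not in infected
                 and sum(u in infected for u in adj[v]) >= k}
        if not newly:
            return infected
        infected |= newly


# Impasse: seeds 0 and 1 are influenced, but blank nodes 2 and 3 each have
# only one influenced neighbour, so 2-complex contagion stalls.
adj = {0: [2], 1: [3], 2: [0, 3], 3: [1, 2]}
before = cascade(adj, k=2, seeds={0, 1})

# A blank newcomer (node 4) links to both influenced nodes and to the two
# blank nodes; it is influenced first, then pushes 2 and 3 over threshold.
adj[4] = [0, 1, 2, 3]
for u in adj[4]:
    adj[u].append(4)
after = cascade(adj, k=2, seeds={0, 1})
print(before, after)
```

Note that the newcomer is not a seed: it is influenced by its two influenced neighbours, and its extra edges supply the missing second influenced neighbour for nodes 2 and 3.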


To capture ‘reactivation’ in ER networks, we define the Advanced
Evolving ER model $G(n, n', p)$.

Definition 4.3. (Advanced Evolving ER Model): In $G(n, n', p)$, we
divide network evolution into two phases, with network sizes n and
$n'$ at the end of the first and second phases, respectively. During
the first phase, we randomly pick some nodes to be seeds, and the
cascading reaches an impasse by the end of this phase. In the second
phase, we add another $n' - n$ ‘blank’ nodes to the network, hoping
to ‘reactivate’ the diffusion process.

In the following sections, we call the first phase the ‘impasse’
part and the second phase the ‘reactivation’ part.

The advanced evolving ER model $G(n, n', p)$ states that when the
diffusion process stagnates by the end of the ‘impasse’ part, the
arrival of ‘blank’ newcomers in the ‘reactivation’ part can help
‘revive’ the influence spreading under certain conditions. The
following lemma makes these conditions precise for each diffusion
model.

LEMMA 4.4. Let $p_0$ be the proportion of influenced nodes by the
end of the first phase of network evolution. Nodes added during the
second phase can reactivate the diffusion and boost the influence
scope to a scale comparable to the network size at time n if the
following conditions are met:

(1) Under the traditional k-complex contagion model, where k is
fixed, network-wide influence diffusion happens with high probability
if $np_0 = k_0$ and $n'$ satisfies

$$n' \geq O\left(\frac{k_0 + 2\ln n - np_0p}{(n-1)(n-2)\,p^2p_0^2}\right) + n. \qquad (1)$$

(2) Under the generalized k-complex contagion model where k follows
the uniform distribution, with $R_u \in \{1, \dots, n-1\}$, network-wide
diffusion happens with high probability if $n'$ satisfies

$$n' > O\left(\frac{n}{\cdots}\right). \qquad (2)$$

(3) Under the generalized k-complex contagion model where k follows
the Poisson distribution, with $R_u \in \{1, \dots, n-1\}$, network-wide
diffusion occurs with high probability if $n'$ satisfies

$$n' \geq O\left(\frac{2\ln n - np_0p + \lambda e^{t_0}}{\cdots}\right)(n-1) + \cdots, \qquad (3)$$

where $t_0$ is a constant.

Lemma 4.4 formalizes the claim that late birds can reactivate the
influence diffusion in ER networks when $n'$ satisfies a certain
inequality. A closer examination of the inequality constraints shows
that ‘reactivation’ is least likely to occur when k follows the uniform
distribution. Furthermore, ‘reactivation’ is more difficult under the
generalized k-complex contagion model than under the traditional
k-complex contagion model. Mathematically, inequality (2) is the
strictest condition, mainly because the uniform distribution yields
the highest expected value of k, which impedes information diffusion
the most. Inequality (3) is harder to satisfy than inequality (1)
because nodes with high k under the generalized k-complex contagion
model greatly impede the cascading. The ‘reactivation’ process also
explains the decline of the required proportion of seeds during
network expansion in some cases of Theorem 4.1.


4.2 Late Comers but Large Diffusion Power 


Next, we present our results in PA networks $PA_{m,n}(V, E)$.


Similarly, in the generalized k-complex contagion models, we let
$R_u \in \{1, \dots, m\}$ when $\mathcal{F}$ is the uniform distribution, and we
normalize $\sum_{k=1}^{m} \frac{\lambda^k e^{-\lambda}}{k!}$ to 1 when $\mathcal{F}$ is the Poisson
distribution. We are then able to draw conclusions even as $n \to \infty$.


THEOREM 4.5. Denote the evolving PA model by $PA_{m,n}(V, E)$
and the k-complex contagion model by $CC(G, p_0, R_u, \mathcal{F})$. In order
to achieve an influence scope that covers the whole network, the
probability that a node becomes a seed upon arrival during the first
phase should satisfy:

(1) $\Theta\left(\frac{1}{\sqrt{n}\,\ln(n-1)}\right)$ under the traditional k-complex
contagion model where k is fixed;

(2) … under the generalized k-complex contagion model where k
follows the uniform distribution, with $H = \ldots$;

(3) … under the generalized k-complex contagion model where k
follows the Poisson distribution.


Theorem 4.5 shows that as PA networks grow larger, the necessary
seed quantity also increases. This suggests that the PA-based
network structure somewhat hinders influence from spreading. The
reason may be that the ‘reactivation’ phenomenon seldom occurs. In
PA networks, ‘reactivation’ occurs with a probability of $p^S$, where S
is a constant in a specific k-complex contagion model. In the evolving
PA model, as [27] shows, the highest probability that the new node
connects to an existing node is $\frac{c}{\sqrt{n}}$ for some constant c, with n
being the network size. Therefore, ‘reactivation’ occurs with a
probability smaller than $\left(\frac{c}{\sqrt{n}}\right)^S$. As $n \to \infty$, ‘reactivation’
becomes increasingly unlikely. Nonetheless, the seed proportion
decreases as PA networks evolve, which again shows the diffusion
capability of late birds.

Moreover, as Theorem 4.5 shows, under the traditional k-complex
contagion model, the order of seed quantity is
$\Theta\left(\frac{n}{\sqrt{n}\,\ln(n-1)}\right)$. When a network doubles its size, the
order of seed quantity becomes $\Theta\left(\frac{2n}{\sqrt{2n}\,\ln(2n-1)}\right)$, about $\sqrt{2}$
times the former quantity. Because seeds are distributed uniformly
in the network, the n newcomers receive only about $\sqrt{2} - 1$ times
as many seeds as the first n nodes; hence about $1/(\sqrt{2} - 1) = \sqrt{2} + 1$
late birds can exert approximately the same amount of influence on
cascading as one early bird. Since late-arriving nodes in PA networks
have smaller degrees, this scaling relation is a meaningful discovery
and can be of practical interest. In an advertising campaign, for
example, asking a celebrity for an endorsement costs much more
than randomly asking a dozen people to promote a product online.
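The doubling argument can be checked numerically. This sketch takes the traditional-model seed order $n/(\sqrt{n}\ln(n-1))$ with every hidden constant set to 1, which is an illustrative assumption; `seed_order` is a hypothetical helper, not a quantity defined in the paper.

```python
import math

def seed_order(n: float) -> float:
    """Seed quantity n / (sqrt(n) * ln(n - 1)), hidden constants set to 1."""
    return n / (math.sqrt(n) * math.log(n - 1))

sizes = [1e3, 1e6, 1e12]
ratios = [seed_order(2 * n) / seed_order(n) for n in sizes]
for n, r in zip(sizes, ratios):
    print(f"n = {n:.0e}: Q(2n) / Q(n) = {r:.3f}")

# In the limit the ratio is sqrt(2): the n newcomers receive about
# (sqrt(2) - 1) times the seeds of the first n nodes, so roughly
# 1 / (sqrt(2) - 1) = sqrt(2) + 1 late birds match one early bird.
late_per_early = 1 / (math.sqrt(2) - 1)
print(f"late birds per early bird (limit): {late_per_early:.3f}")
```

The exact ratio is $\sqrt{2}\,\ln(n-1)/\ln(2n-1)$, so it approaches $\sqrt{2}$ only logarithmically slowly; the "about $\sqrt{2}$" statement is an asymptotic one.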


5 PROOF OF THE RESULTS 


In this section, we prove that with a sufficient number of randomly
assigned seeds arriving over time, influence can eventually spread to
a scale comparable to the whole evolving ER (or PA) network under
the generalized k-complex contagion models once the network grows
large enough.


5.1 Overview of The Proof 


We provide a proof overview before diving into technical details.
Our proof applies only to undirected contagions, where influence can
diffuse in both directions along an edge. As seeds arrive randomly
along with ‘blank’ nodes, we assume that each arriving node is a seed
with probability θ. Our goal is to find a suitable θ that spreads the
influence to a scale comparable to the whole network size with high
probability.

The diffusion process can be decomposed into two parts: ‘initial’
diffusion and ‘inner’ diffusion. The ‘initial’ diffusion captures the
influence that newly arriving nodes bring to the network: if a newly
arriving node is a seed or is influenced by nodes already in the
network, it may in turn influence its neighbors that already exist in
the network. During ‘inner’ diffusion, influence spreads among
existing nodes: newly influenced nodes spread influence to their
neighbors in an iterative manner. In the following, we analyze the
‘initial’ diffusion and the ‘inner’ diffusion mathematically.
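The two-part decomposition can be illustrated by a Monte Carlo sketch on the evolving ER model. Assumptions not fixed by the text: the cascade settles completely before the next node arrives, and it is propagated with a stack instead of synchronous rounds (for monotone threshold dynamics both reach the same final set). The names `simulate_growth` and `k_sampler` are hypothetical.

```python
import random
from typing import Callable, List

def simulate_growth(n: int, p: float, theta: float,
                    k_sampler: Callable[[random.Random], int],
                    rng: random.Random) -> float:
    """Grow an evolving ER graph to n nodes with random seeding.

    Each arriving node is a seed with probability theta and draws its
    threshold from k_sampler upon arrival. Returns the influenced fraction.
    Assumes the cascade settles completely before the next node arrives.
    """
    adj: List[List[int]] = []
    k: List[int] = []
    infected: List[bool] = []
    count: List[int] = []      # number of influenced neighbours per node
    for t in range(n):
        nbrs = [u for u in range(t) if rng.random() < p]
        adj.append(nbrs)
        for u in nbrs:
            adj[u].append(t)
        k.append(k_sampler(rng))
        count.append(sum(infected[u] for u in nbrs))
        infected.append(False)
        # 'Initial' diffusion: t is a seed or is influenced upon arrival.
        if rng.random() < theta or count[t] >= k[t]:
            infected[t] = True
            stack = [t]
            # 'Inner' diffusion: push influence through existing nodes until
            # no further node reaches its threshold.
            while stack:
                v = stack.pop()
                for w in adj[v]:
                    count[w] += 1
                    if not infected[w] and count[w] >= k[w]:
                        infected[w] = True
                        stack.append(w)
    return sum(infected) / n


rng = random.Random(3)
frac = simulate_growth(300, 0.05, 0.02, lambda r: 2, rng)
print(f"influenced fraction with fixed k = 2: {frac:.2f}")
```

Passing a different `k_sampler` (for example one that draws from a truncated Poisson law) switches between the traditional and generalized contagion models without changing the growth loop.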

‘Initial’ Diffusion: Let G be the graph generated by the evolving ER or PA model. Let node t be the t-th node that arrives in G. Let $V_t$ be the set of the first t nodes in G, and let $X_t$ be the set of ‘influenced’ nodes in $V_t$. If t has ‘influence threshold’ k, then t gets infected if and only if at least k of its neighbors belong to $X_{t-1}$. Let $I_t$ be the state indicator of node t, with $I_t = 1$ signifying the infected state and $I_t = 0$ otherwise. We write $\mathbb{1}(N_u^t) = 1$ for the event that node u connects with node t, and $\mathbb{1}(N_u^t) = 0$ otherwise. Hence, $\mathbb{1}(N_u^t) \cdot I_u = 1$ means that node t has an influenced neighbor u. Define $Y_t$ as the current proportion of ‘influenced’ degrees (recall Definition 3.1) in the whole network, and denote the degree of node t by $\deg(t)$. The total number of ‘influenced’ degrees in the whole network after the arrival of t is $\sum_{v \in V_t} \deg(v) Y_t$. It is determined by the following four ingredients:


• There are currently $\sum_{v \in V_{t-1}} \deg(v) Y_{t-1}$ ‘influenced’ degrees.

• The edges that t emits into the set $X_{t-1}$ contribute to the infected degrees.

• Let $p_{sum}$ be the probability that node t gets infected. $p_{sum}$ is the sum of two probabilities: the probability that t is a seed, and the probability that t is not a seed but is influenced immediately upon arrival. If t is infected, then all the edges it emits contribute to the ‘influenced’ degrees.

• Let $H(u)$ be the number of infected neighbors of an existing node u whose ‘influence threshold’ is k, and suppose $H(u) = k - 1$. Then u becomes influenced if t is influenced and t connects to it. We denote the probability of this event by $M(u) = \Pr[I_u \mid N_u^t, H(u) = k-1, I_t = 1, I_u = 0]$. Multiplying $M(u)$ by the degree of u gives the expected number of ‘influenced’ degrees that node u brings to the network through this event. We use $\gamma$ to denote the total number of ‘influenced’ degrees that this event brings to the network.


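Adding up these ingredients, we get the following recurrence:

$$\sum_{v \in V_t} \deg(v)\, Y_t = \sum_{v \in V_{t-1}} \deg(v)\, Y_{t-1} + \deg(t)\, Y_{t-1} + \deg(t)\, p_{sum} + \gamma, \tag{4}$$

$$p_{sum} = \theta + (1-\theta)\Pr[I_t = 1], \tag{5}$$

$$\gamma = \sum_{u=1}^{t-1} \Pr[I_u \mid N_u^t, H(u) = k-1, I_t = 1, I_u = 0] \cdot \deg(u). \tag{6}$$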

‘Inner’ Diffusion: We assume that influence diffusion is instantaneous. Therefore, we need not consider the actual starting point of the ‘inner’ diffusion and can directly describe its final state at the last time slot. In evolving ER networks, all nodes connect to others with the same probability p, so every node has the same diffusion pattern and we can analyze the ‘inner’ diffusion holistically. In evolving PA networks, by contrast, the ‘inner’ diffusion accumulates over time. Because the influence diffuses without delay, we can again release the ‘inner’ diffusion at the last time slot.

Under the k-complex contagion model, a ‘blank’ node becomes ‘influenced’ only in one of two ways: it is influenced during the ‘initial’ diffusion, or at least k of its neighbors are ‘influenced’, where k is its ‘influence threshold’. Let $p_0$ be the proportion of influenced nodes at the end of the ‘initial’ diffusion. Let $R_u \sim \mathcal{D}$ be the ‘influence threshold’ of node u, and let $J_u$ indicate whether u is influenced once the ‘inner’ diffusion has finished. For nodes i, u in the network, we have:

$$\Pr[J_u = 1] = p_0 + (1-p_0)\sum_{k} \Pr_{R_u \sim \mathcal{D}}[R_u = k]\; \Pr\Big[\sum_{i=1}^{n} \mathbb{1}(N_u^i)\, I_i \ge k\Big]. \tag{7}$$

We then obtain the probability that a scale comparable to the whole network size is affected by multiplying the infection probabilities of the individual nodes u.
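Eqn. (7) can be evaluated numerically once the distribution of the number of influenced neighbours is fixed. The Python sketch below is a minimal version that rests on three assumptions:

- The number of influenced neighbours of u is Binomial($n$, $p p_0$). This mirrors the total influence $n p p_0$ used for the ‘impasse’ part of an ER network, with no ‘reactivation’ part.
- Node events are independent when the per-node probabilities are multiplied.
- The Poisson threshold distribution is truncated at `k_max`.

The helper names are illustrative.

```python
from math import comb, exp, factorial


def prob_at_least(n, q, k):
    """Pr[X >= k] for X ~ Binomial(n, q)."""
    if k <= 0:
        return 1.0
    below = sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(min(k, n + 1)))
    return max(0.0, 1.0 - below)


def final_influence_prob(p0, p, n, threshold_pmf):
    """Pr[J_u = 1] from Eqn. (7), assuming the number of influenced
    neighbours of u is Binomial(n, p * p0)."""
    q = p * p0
    tail = sum(w * prob_at_least(n, q, k) for k, w in threshold_pmf.items())
    return p0 + (1 - p0) * tail


def fixed_pmf(k):
    return {k: 1.0}


def uniform_pmf(n):
    # Pr[R_u = k] = 1/(n-1) for k = 1, ..., n-1
    return {k: 1.0 / (n - 1) for k in range(1, n)}


def poisson_pmf(lam, k_max):
    # truncated at k_max; keep k_max <= 170 so factorial(k) fits in a float
    return {k: exp(-lam) * lam**k / factorial(k) for k in range(k_max + 1)}


def prob_all_blank_influenced(prob_each, count):
    # product over nodes, treating their events as independent (an assumption)
    return prob_each ** count
```

For example, `final_influence_prob(0.2, 0.03, 2000, poisson_pmf(10, 60))` evaluates Eqn. (7) for a Poisson threshold with $\lambda = 10$. Keep `n` below about 1000 when using `uniform_pmf`, because the binomial coefficients are converted to floats.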

Therefore, by combining the ‘initial’ and ‘inner’ diffusion processes, we can deduce a suitable $\theta$ that ensures a network-wide infection.

To make the proof easier to follow, we first show the power of the ‘inner’ diffusion. We determine the proportion of influenced nodes needed to spread the influence to the whole network. Then, using the recurrence, we consider the ‘initial’ diffusion and work backwards to derive the number of seeds.


5.2 Cascading under Traditional k-complex Contagion Model


In this part, we first prove our results in evolving ER networks and then in evolving PA networks. Due to space limitations, we present only the main technical flow of the proof. Many intermediate results are summarized as lemmas and corollaries, whose detailed derivations are deferred to the full version [31].


5.2.1 Proof of Lemma 4.4 (1). This lemma shows the power of nodes that arrive in the ‘reactivation’ part (recall Definition 4.3) of G(n, n′, p). It also reveals the ‘inner’ influence among nodes of the ‘impasse’ part (i.e., the ‘reactivation’ phenomenon). Let i denote a node in the ‘reactivation’ part and u a ‘blank’ node in the ‘impasse’ part. As Fig. 1 shows, to contribute to the influence diffusion, node i must first be influenced (i.e., $I_i = 1$) and then connect to a ‘blank’ node u in the ‘impasse’ part (i.e., $\mathbb{1}(N_u^i) = 1$). To be influenced, node i must emit at least k edges into the ‘influenced set’ $X_n$ (recall the definition in the ‘initial’ diffusion). The probability that i emits an edge into $X_n$ is $p p_0$. Therefore, we use Bayes' formula to express the probability that a node in the ‘reactivation’ part facilitates influence spreading:

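$$\Pr[I_i \mid N_u^i, I_u] = \frac{\Pr[I_i = 1, N_u^i, I_u = 0]}{\Pr[N_u^i, I_u = 0]} \approx \frac{\sum_{j \ge k} C_n^{j}\,(p_0 p)^{j}(1-p_0 p)^{n-j}\bigl(1-(1-q)^{n-j}\bigr)}{1-\bigl(1-(1-p_0 p)q\bigr)^{n}} \tag{8}$$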


Here, q denotes the inverse of the ‘uninfected’ population in the ‘impasse’ part.

Suppose the fixed ‘influence threshold’ $R_u$ is k. In an evolving ER network, every edge appears with probability p. Recall that the proportion of influenced nodes is $p_0$ at the end of the ‘impasse’ part. Consider a node u in G(n, n′, p). The influence to which u is subjected has two sources: nodes in the ‘impasse’ part and nodes in the ‘reactivation’ part. Nodes from the ‘impasse’ part exert a total influence of $n p p_0$ on u. Nodes from the ‘reactivation’ part exert an influence amounting to $\sum_{i=n+1}^{n'} \Pr[I_i \mid N_u^i, I_u]$. According to Eqn. (7), the probability that u gets influenced in an evolving ER network is:


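$$\begin{aligned} \Pr[J_u = 1] &= p_0 + (1-p_0)\Pr\Big[\sum_{i=1}^{n'} \mathbb{1}(N_u^i)\, I_i \ge k\Big] \\ &= p_0 + (1-p_0)\Pr\Big[p_0 p n + \sum_{i=n+1}^{n'} \Pr[I_i \mid N_u^i, I_u] \ge k\Big]. \end{aligned} \tag{9}$$

We can further simplify $\Pr[I_i \mid N_u^i, I_u]$ with the following corollary.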


Corollary 5.1. When $n' \to \infty$ and $p \ll 1$, we have:
$$\Pr[I_i \mid N_u^i, I_u] = (n-1)(n-2)(p_0 p)^2. \tag{10}$$


Substituting the corollary into the expression for $\Pr[J_u = 1]$, we have:

$$\Pr[J_u = 1] = p_0 + (1-p_0)\Big(1 - \Pr\big[p_0 p n + (n'-n)(n-1)(n-2)p_0^2 p^2 < k\big]\Big).$$

Hence, by the union bound over the n nodes, the probability that every node in the ‘impasse’ part is influenced is:

$$\Pr[\text{First } n \text{ nodes get influenced}] \ge 1 - n(1-p_0)\Pr\Big[\sum_{i=1}^{n'} \mathbb{1}(N_u^i)\, I_i < k\Big]. \tag{11}$$

Taking $t_1$ as a constant, by the Chernoff bound we have:

$$\Pr[\text{First } n \text{ nodes get influenced}] \ge 1 - \frac{n(1-p_0)\exp(t_1 k)}{\exp\big(t_1\,(p_0 p n + (n'-n)(n-1)(n-2)p_0^2 p^2)\big)}.$$

Once $n p p_0 + (n'-n)(n-1)(n-2)p_0^2 p^2 > k + 2\ln n$ is satisfied, $\Pr[\text{First } n \text{ nodes get influenced}]$ equals $1 - O(1/n)$, which goes to 1 as $n \to \infty$.

Furthermore, Lemma 4.4 (1) shows what happens when $n' = n$ (i.e., when all nodes belong to the first part): the ‘inner’ influence takes off once $n p p_0 = \Theta(2\ln n)$. With this result, we proceed to the proof of Theorem 4.1 (1).


5.2.2 Proof of Theorem 4.1(1). The key to the proof is to make explicit the ‘influence’ potential (i.e., $p_{sum}$ and $\gamma$) of the newcomer. As new nodes join the network one after another, influence spreads recursively. Therefore, we establish recursive equations and derive expressions for $p_{sum}$ and $\gamma$.

$p_{sum}$ is determined by the following two probabilities:

• The probability that the newcomer is a seed.
• The probability that the newcomer connects with a sufficient number of ‘influenced’ nodes.


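Accordingly, we can rewrite Eqn. (5) to make $p_{sum}$ explicit:

$$p_{sum} = \theta + \Big[1 - \sum_{i=0}^{k-1} C_{t-1}^{i}(pY_{t-1})^{i}(1-pY_{t-1})^{t-1-i}\Big](1-\theta). \tag{12}$$

Similarly, $\gamma$ can be made explicit by rewriting Eqn. (6) as:

$$\gamma = (t-1)^2 p^2\, p_{sum}\, C_{t-2}^{k-1}(pY_{t-1})^{k-1}(1-pY_{t-1})^{t-1-k}. \tag{13}$$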
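Eqn. (12) can be computed directly as a binomial tail. The Python sketch below is a minimal illustration; the function name is hypothetical. It treats each of the newcomer's t − 1 potential edges as landing on ‘influenced’ mass with probability $pY_{t-1}$, exactly as Eqn. (12) does.

```python
from math import comb


def p_sum_fixed_threshold(theta, p, y_prev, t, k):
    """Eqn. (12): newcomer t is a seed, or at least k of its t-1 potential
    edges hit influenced mass, each with probability p * Y_{t-1}."""
    q = p * y_prev
    # Pr[fewer than k edges hit influenced mass]; comb returns 0 when i > t-1
    below_k = sum(comb(t - 1, i) * q**i * (1 - q) ** (t - 1 - i) for i in range(k))
    return theta + (1 - below_k) * (1 - theta)
```

Two edge cases show the structure. With `y_prev = 0`, no influenced mass exists and the function returns `theta`, the seed probability alone. With `theta = 1`, every newcomer is a seed and the function returns 1.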


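Both $p_{sum}$ and $\gamma$ can be further simplified by the following corollary.

Corollary 5.2. When $k = O(1)$ and $t \to \infty$, $p_{sum}$ and $\gamma$ can be simplified as

$$p_{sum} \approx \Big(1 - \frac{2\sqrt{k}}{\sqrt{2\pi(t-1)}}\Big)\theta + \frac{2\sqrt{k}}{\sqrt{2\pi(t-1)}} \quad\text{and}\quad \gamma \approx (t-1)p^2\, p_{sum} \cdot \frac{\cdots}{\sqrt{2\pi(k-1)}\,\sqrt{t-k}}.$$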


In an ER network, all nodes share the same expected degree upon arrival, $\deg(t) = (t-1)p$, and thus possess the same ‘status’. By sequentially introducing new nodes and adding their ‘influence’ power to the network, we can rewrite Eqn. (4) as:


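$$tY_t = (t-1)Y_{t-1} + \Big[\Big(1 - \frac{2\sqrt{k}}{\sqrt{2\pi(t-1)}}\Big)\theta + \frac{2\sqrt{k}}{\sqrt{2\pi(t-1)}}\Big]\Big(1 + \frac{\cdots}{\sqrt{2\pi(k-1)}\,\sqrt{t-k}}\Big). \tag{14}$$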
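The step from Eqn. (4) to this normalized ER form can be made explicit. The short derivation below rests on two assumptions. First, the total degree of $V_t$ is approximately $t(t-1)p$, since each of the $t(t-1)/2$ pairs is an edge with probability p. Second, $\deg(t) = (t-1)p$. Dividing both sides of Eqn. (4) by $(t-1)p$ then gives:

```latex
% Assumption: \sum_{v \in V_t} \deg(v) \approx t(t-1)p and \deg(t) = (t-1)p (evolving ER model).
t(t-1)p\,Y_t = (t-1)(t-2)p\,Y_{t-1} + (t-1)p\,Y_{t-1} + (t-1)p\,p_{sum} + \gamma
\quad\Longrightarrow\quad
tY_t = (t-1)Y_{t-1} + p_{sum} + \frac{\gamma}{(t-1)p}
```

In this form the newcomer contributes $p_{sum}$ plus a correction $\gamma/((t-1)p)$. Substituting the ER expressions for $p_{sum}$ and $\gamma$ should give recurrences with the structure of Eqn. (14).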


The following corollary concerns $Y_n$.

Corollary 5.3. As $n \to \infty$ and $n \gg k$, we can reformulate the equation using several constants denoted by capital letters:

$$nY_n = \big(An - B\ln(n-1) + C\big)\theta + D\Big(\ln\frac{\sqrt{2\pi(n-1)}}{2\sqrt{k}} - E\Big). \tag{15}$$

Here, A, B, C, D and E are constants determined by k and p.


The above proof shows that once $n p p_0 = \Theta(2\ln n)$, the ‘inner’ diffusion is powerful enough to spread the influence to the whole network. Therefore, we have $nY_n = \Theta(2\ln n)$. To keep the expression concise, we retain only the dominating term. Hence, in ER networks under the traditional k-complex contagion model, network-wide influence spreading occurs if $\theta$ satisfies $\theta = \Theta\big(\frac{\ln n}{np}\big)$.





5.2.3 Proof of Theorem 4.5(1). Every node in an evolving PA network emits m edges upon arrival, with $m = O(1)$, so ‘reactivation’ has little chance to take place. In view of this, we neglect $\Pr[I_i \mid N_u^i, I_u]$ in PA networks.

Our proof is built upon two corollaries. Corollary 5.4 gives the probability that a node gets influenced and the proportion of influenced nodes required for network-wide influence. In other words, it analyzes the cases in which the ‘inner’ diffusion can produce a network-wide cascade. Corollary 5.5 shows the potential ‘influence’ power of the newcomer (i.e., the ‘initial’ diffusion). In particular, we show that a scale comparable to the whole network size gets influenced with probability $1 - O(1/n)$ when the initial seed proportion $\theta$ meets a certain condition.


Corollary 5.4. Suppose $p_0 > 0$ and let $d(u)$ be the degree of node u. Then Eqn. (7) specializes to

$$\Pr[J_u = 1] = p_0 + (1-p_0)\Big[1 - \Pr\Big[\sum_{i=1}^{n} \mathbb{1}(N_u^i)\, I_i < k\Big]\Big], \tag{16}$$

where

$$\Pr\Big[\sum_{i=1}^{n} \mathbb{1}(N_u^i)\, I_i < k\Big] = \Pr\big[d(u)\,(p_0 + \cdots) < k\big].$$

Furthermore, if $p_0 = \Theta(\cdots)$, a cascade whose scope is comparable to the whole network size occurs with high probability.


Corollary 5.5. Let k and m be two constants with $m > k$. Let L, M and N be constants, where $N = \frac{2\sqrt{k}}{\sqrt{2\pi(m-1)}}$. Then in the evolving PA model, $p_{sum}$, $\gamma$ and Eqn. (4) take the following forms, respectively:

$$p_{sum} = \Big(1 - \frac{2\sqrt{k}}{\sqrt{2\pi(m-1)}}\Big)\theta + \frac{2\sqrt{k}}{\sqrt{2\pi(m-1)}},$$

$$\gamma = (\cdots)\, p_{sum},$$

$$2mtY_t = m(2t-1)Y_{t-1} + \Big(L + M\Big(1 - \frac{\sqrt{t}}{2(t-1)}\Big)\Big)\big((1-N)\theta + N\big). \tag{17}$$

If $\theta$ satisfies $\theta = \Theta(\cdots)$, network-wide influence spreading happens with high probability.


5.3 Cascading in Generalized k-complex Contagion Model


By introducing dynamics into each node's ‘influence threshold’, the k-complex contagion model can capture individual differences and thus better reflect reality. In this paper, we consider two types of threshold dynamics, in which the ‘influence threshold’ follows one of two distributions: the uniform distribution or the Poisson distribution. Again, we present only the most important proof techniques; details are available in the full version [31].


5.3.1 Proof of Lemma 4.4(2). We first sketch the proof when the ‘influence threshold’ k follows the uniform distribution in an evolving ER network. Since the distribution is uniform, we have

$$\Pr[R_u = k] = \frac{1}{n-1}, \qquad k = 1, \dots, n-1.$$


Corollary 5.6. Substituting $\Pr[R_u = k]$ into Eqns. (8) and (7), we derive the following results by virtue of the Chernoff bound:

$$\Pr[I_i \mid N_u^i, I_u] = p p_0, \tag{18}$$

$$\Pr[J_u = 1] = p_0 + (1-p_0)\Big[1 - \sum_{k=1}^{n-1} \frac{1}{n-1}\Pr\big[n p p_0 + (n'-n)p p_0 < k\big]\Big]. \tag{19}$$


In conclusion, if $n' = \Omega\big(\frac{n}{p p_0}\big)$ as $n \to \infty$, a scale comparable to the ‘impasse’ part will be influenced.


5.3.2 Proof of Theorem 4.1(2). In the evolving ER model, we set $n' = n$ in the above proof (i.e., there is no ‘reactivation’ part in the network). We then need a proportion $p_0 = \Theta(1)$ of influenced nodes to complete the cascade during the ‘inner’ diffusion. As in the previous proofs, we focus on determining the influence spread by the ‘initial’ diffusion.

By specifying Eqns. (5) and (6), the following corollary gives the expressions for $p_{sum}$ and $\gamma$.

Corollary 5.7. In an ER network where k is uniformly distributed,

$$p_{sum} = (1 - pY_{t-1})\theta + pY_{t-1}, \qquad \gamma = p^2\, p_{sum}. \tag{20}$$


Substituting the above recursive equations into Eqn. (4), we obtain:

$$tY_t = \Big(t - 1 + \frac{t-1+p}{t-1}\,p(1-\theta)\Big)Y_{t-1} + \frac{t-1+p}{t-1}\,\theta. \tag{21}$$


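Corollary 5.8. When the network size is t, the proportion of ‘influenced’ nodes in the network (i.e., $Y_t$) satisfies

$$Y_t \to \frac{\theta}{1 - p(1-\theta)} \quad (t \to \infty). \tag{22}$$

As $t = n \to \infty$, setting $Y_n = p_0 = \Theta(1)$ gives

$$\theta = \frac{\Theta(1)\,(1-p)}{1 - \Theta(1)\,p}.$$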
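The limit in Corollary 5.8 can be checked by iterating Eqn. (21) numerically. The Python sketch below is minimal; the function names, the starting value $Y_1$, and the illustrative values of $\theta$ and p are assumptions. It also inverts the fixed point, $\theta = Y(1-p)/(1-Yp)$, to recover the seed probability that produces a target proportion.

```python
def iterate_eq21(theta, p, n, y_start=0.0):
    """Iterate Eqn. (21) from t = 2 up to t = n, starting from Y_1 = y_start."""
    y = y_start
    for t in range(2, n + 1):
        c = (t - 1 + p) / (t - 1)
        y = ((t - 1 + c * p * (1 - theta)) * y + c * theta) / t
    return y


def limiting_proportion(theta, p):
    # fixed point of Eqn. (21) as t -> infinity: theta / (1 - p(1 - theta))
    return theta / (1 - p * (1 - theta))


def seed_fraction_for(y_target, p):
    # invert the fixed point: theta = Y (1 - p) / (1 - Y p)
    return y_target * (1 - p) / (1 - y_target * p)
```

The deviation from the fixed point shrinks roughly like $t^{-(1 - p(1-\theta))}$. For example, `iterate_eq21(0.2, 0.03, 100_000)` should already be very close to `limiting_proportion(0.2, 0.03)`.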





5.3.3 Proof of Lemma 4.4(3). This lemma proves the reactivation phenomenon under the ER model when k follows the Poisson distribution. It shows the power of nodes in the ‘reactivation’ part and how influence spreads via the ‘inner’ diffusion among nodes in the ‘impasse’ part. From this, it finds the required number of nodes in the ‘reactivation’ part (i.e., $n' - n$). The analysis is similar to that of Lemma 4.4 (2), so we present only a sketch of the proof here.

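Integrating the Poisson distribution into Eqns. (7) and (8), we have:

$$\Pr[I_i \mid N_u^i, I_u] = \frac{2\sqrt{\lambda}}{\sqrt{2\pi(n-1)}}, \tag{23}$$

$$\Pr[J_u = 1] = 1 - (1-p_0)\sum_{k=1}^{n-1}\frac{\lambda^k e^{-\lambda}}{k!}\Pr\Big[n p p_0 + (n'-n)\frac{2\sqrt{\lambda}}{\sqrt{2\pi(n-1)}} < k\Big]. \tag{24}$$

The probability that the entire ‘impasse’ part is influenced is:

$$\Pr[\text{First } n \text{ nodes get influenced}] \ge 1 - \frac{n(1-p_0)\, e^{\lambda(e^{t_1}-1)}}{\exp\Big(t_1\Big(n p p_0 + (n'-n)\frac{2\sqrt{\lambda}}{\sqrt{2\pi(n-1)}}\Big)\Big)}, \tag{25}$$

where $t_1$ is a constant parameter of the Chernoff bound. In conclusion, suppose $n p p_0 + (n'-n)\frac{2\sqrt{\lambda}}{\sqrt{2\pi(n-1)}} \ge 2\ln n + \lambda(e^{t_1}-1)$ is satisfied. Then it is almost certain that all nodes in the ‘impasse’ part are influenced.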


5.3.4 Proof of Theorem 4.1(3). As in the previous proof, we let $n' = n$ (i.e., omit the ‘reactivation’ part). We find that roughly $\Theta(2\ln n)$ seeds are needed to complete the cascade during the ‘inner’ diffusion. In the following, we reveal the power of the ‘initial’ diffusion under the Poisson distribution in an evolving ER network and find the required $\theta$.


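Corollary 5.9. Embedding the Poisson distribution into Eqns. (5) and (6), we have:

$$p_{sum} = \Big(1 - \frac{2\sqrt{\lambda}}{\sqrt{2\pi(t-1)}}\Big)\theta + \frac{2\sqrt{\lambda}}{\sqrt{2\pi(t-1)}} \quad\text{and}\quad \gamma \approx \frac{(t-1)p^2\, p_{sum}}{\sqrt{2\pi\lambda}}.$$

The recursive equation for $Y_t$ is:

$$tY_t = (t-1)Y_{t-1} + \Big(1 + \frac{p}{\sqrt{2\pi\lambda}}\Big)\Big[\Big(1 - \frac{2\sqrt{\lambda}}{\sqrt{2\pi(t-1)}}\Big)\theta + \frac{2\sqrt{\lambda}}{\sqrt{2\pi(t-1)}}\Big].$$

Corollary 5.10. When the network size $n \to \infty$ and $p_0 = \Theta(\cdots)$, we need to choose a proportion $\theta = \Theta(\cdots)$ of the nodes in the network to be seeds.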


5.3.5 Proof of Theorem 4.5(2) and (3). In an evolving PA network, the reasoning under the two generalized k-complex contagion models differs mainly in the mathematical details. Here we present only the key results of the proof, treating Theorem 4.5 (2) and (3) together with the help of Lemmas 5.11 and 5.12.


Lemma 5.11. Suppose $n \to \infty$. When the ‘influence threshold’ k follows the uniform distribution in an evolving PA network, $p_0 = \Theta(\cdots)$. Let $H = \cdots$ ($n \gg H$). For each newcomer:

$$p_{sum} = (1 - Y_{t-1})\theta + Y_{t-1} \quad\text{and}\quad \gamma = \frac{2(\sqrt{m-1} - 1)}{m\sqrt{2\pi}}\, p_{sum}.$$

Then the seed proportion required for network-wide influence is $\theta = \Theta(\cdots)$.

Lemma 5.12. When the ‘influence threshold’ k follows the Poisson distribution in an evolving PA network, $p_0 = \Theta(\cdots)$. For each newcomer, we have:

$$p_{sum} = \Big(1 - \frac{2\sqrt{\lambda}}{\cdots}\Big)\theta + \frac{2\sqrt{\lambda}}{\cdots},$$

$$\gamma = \frac{1}{\sqrt{2\pi\lambda}}\Big(\frac{\sqrt{\lambda}}{2m} + \frac{3}{2\sqrt{\lambda}-1} + M - \frac{\sqrt{t}}{(2\sqrt{\lambda}-1)(t-1)}\Big)p_{sum},$$

where M is a constant. Then the proportion of seeds required for network-wide influence is $\theta = \Theta(\cdots)$.

Table 1: Theoretical Fitting Curve

Situation | Probability | Fitting Result
G ∈ ER, fixed threshold | θ = a · [2 ln n − D(ln(√(2π(n−1))/(2√k)) − E)] / [An − B ln(n−1) + C] ¹ | a = 4.903
G ∈ ER, uniform threshold | θ = a · (⋯) | a = 0.96, b = 2.10
G ∈ ER, Poisson threshold | θ = a · (⋯) | a = 13.16
G ∈ PA, fixed threshold | θ = a · (⋯) | a = 0.43
G ∈ PA, uniform threshold | θ = a · (⋯) | a = 0.014
G ∈ PA, Poisson threshold | θ = a · 1/(√t − ln(t−1)) | a = 0.6031

¹ A, B, C, D and E are all constants; their details can be found in Corollary 5.3.
6 EXPERIMENTS 


We perform simulations on both synthetic and real networks for two purposes: to examine how information cascades in evolving networks, and to verify our estimates of the order of the seed quantity under each setting. The accuracy of the theoretical estimates is shown by comparing the fitted theoretical curves with the actual curves.


6.1 Experiments under Synthetic Networks 


For synthetic networks, we generate graphs with sizes ranging from 10,000 to 500,000, using the evolving ER model G(n, p) and the evolving PA model respectively. We adopt p = 0.03 for all evolving ER constructions and set m = 10 for the evolving PA model $PA_{m,n}(V, E)$. We take k = 5 in the traditional k-complex model and λ = 10 for the Poisson distribution. To get closer to reality and broaden the scope of our work, we also consider the Gaussian distribution as a third possible distribution of k. Analysis under the Gaussian model is too difficult, so we study this case only empirically, by observing the information cascading during the experiments. We set μ = 8 and σ = 2 and discretize the Gaussian distribution by p(x) = F(x + 0.5) − F(x − 0.5), where F is the Gaussian CDF. We then normalize the probabilities so that they sum to 1.
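The discretization above can be written in a few lines of Python. This is a minimal sketch, not the authors' code. The truncated support {1, …, 20} is an assumption, since the text does not state the range of thresholds, and the function names are hypothetical. F is computed from `math.erf`.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def discretized_gaussian_pmf(mu=8.0, sigma=2.0, k_min=1, k_max=20):
    """p(k) = F(k + 0.5) - F(k - 0.5) on k_min..k_max, renormalized to sum to 1.
    The support k_min..k_max is an assumed truncation."""
    raw = {k: normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
           for k in range(k_min, k_max + 1)}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}
```

With μ = 8 and σ = 2, almost all probability mass already lies inside the assumed support. The renormalization therefore changes the weights only slightly.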

We now present, one by one, the information cascading model settings and the fitting equations in ER and PA networks. In each fitting equation, θ is the actual proportion of seeds in the whole network and a is the fitting parameter of the theoretical formula. The results are presented in Table 1. In each situation, the parameter a (and b) is obtained by fitting the theoretical estimate to the first four data points.
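For formulas of the form θ = a · f(n), fitting a to a few points has a closed-form solution. The Python sketch below assumes a least-squares criterion; the paper does not state which fitting criterion was used. Two-parameter rows such as (a, b) would need a nonlinear fit instead. The network sizes and the shape f(n) = 1/ln n in the example are hypothetical, and the "observations" are synthetic values generated with a = 0.5. They are not data from the paper.

```python
from math import log


def fit_scale(shape_values, theta_observed):
    """Least-squares a for theta ~ a * f(n): a = sum(f * theta) / sum(f * f)."""
    num = sum(f * y for f, y in zip(shape_values, theta_observed))
    den = sum(f * f for f in shape_values)
    return num / den


# illustrative only: hypothetical sizes and shape, synthetic theta values with a = 0.5
sizes = [10_000, 20_000, 50_000, 100_000]
shape = [1.0 / log(n) for n in sizes]
theta_obs = [0.5 * f for f in shape]
a_hat = fit_scale(shape, theta_obs)
```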


Figure 2 (a) shows the results when the ‘influence threshold’ is a constant. Once the parameter a is determined from the first four data points, the theoretical estimates match the actual seed numbers very closely under both network topologies. This indicates that our result is a critical bound on the number of seeds when they are chosen randomly. We also find that fewer seeds are needed in larger ER networks. This seemingly counter-intuitive result can be explained by the fact that late birds emit many more edges than early ones in ER networks, and greater connectivity intuitively favors information cascading.

Figure 2 (b) shows the accuracy of our theoretical estimates in ER networks where the ‘influence threshold’ k varies across nodes. The overlapping horizontal lines are zoomed in for better discrimination. Except when k follows the uniform distribution, the seed quantity decreases as the network expands. Under uniformly distributed k, the required seed number instead rises with network size. The reason is that there are quite a few ‘unbending’ nodes with high ‘influence thresholds’, which hamper the information cascading.

Figure 2 (c) illustrates the accuracy of our theoretical estimates in PA networks where the ‘influence threshold’ k varies across nodes. The overlapping horizontal lines are zoomed in for better discrimination. In general, the results are the opposite of those obtained in ER networks. Far fewer seeds are needed when the ‘influence threshold’ follows the uniform distribution than when it follows the Poisson or normal distribution. Since λ and μ are close to m, the Poisson and normal distributions produce more nodes with high ‘influence thresholds’ and hence a higher average k. The results show how the value of $R_u$ affects the necessary seed number.


6.2 Experiments under Real Networks 


Our real networks are the coauthorship datasets of machine learning and bioinformatics from 1965 to 2016. As observed in [28], both networks comply with the PA model. The experiments in this part can therefore be seen as experiments on the PA network topology, with the same ‘influence threshold’ k = 5 for all nodes.

Since the annual growth of the networks is small, we instead group the data into longer time intervals. We divide the datasets at six time stamps: 1980, 1990, 2000, 2005, 2010 and 2016. Because these networks grow faster after 2000, we halve the time interval from 10 years to 5 years from then on. The machine learning network has 1.51 million nodes, and the bioinformatics network contains 1.82 million nodes in total.

In the real networks, one new node arrives at every time slot, and nodes that arrive in the same year are added in a random sequential order. Some newcomers have no connections to existing nodes, or fewer than 10 edges to earlier nodes. For these, we manually add edges according to the PA model until the newcomer has 10, so that it is not isolated from other nodes.
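The top-up rule above can be sketched as follows. This is a minimal Python illustration under stated assumptions; the function name and the pool representation are hypothetical. "According to the PA model" is read as degree-proportional sampling from a pool in which each earlier node appears once per unit of degree, so earlier nodes of degree 0 are never picked.

```python
import random


def top_up_edges(existing, degree_pool, target=10, rng=random):
    """Return the newcomer's neighbour set after adding degree-proportional
    picks from degree_pool (each earlier node listed once per unit of degree)
    until it has `target` distinct neighbours. Nodes that already have
    `target` or more neighbours are returned unchanged."""
    neighbours = set(existing)
    if len(neighbours | set(degree_pool)) < target:
        raise ValueError("not enough distinct earlier nodes to reach target")
    while len(neighbours) < target:
        neighbours.add(rng.choice(degree_pool))
    return neighbours
```

After topping up, the caller would append each new endpoint to `degree_pool`, so that the degree-proportional sampling stays consistent for later arrivals.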

Figure 3 shows the results on the machine learning network and the bioinformatics network. Both networks are characterised by a power-law degree distribution, so the simulated curve is drawn with the theoretical estimate nθ = a · n · (⋯). The real data points are scattered evenly above and below the simulated curve, confirming the PA network topology of the two real networks. Once again, our theoretical estimate performs very well, with fitting parameter a = 0.4111 for the machine learning network and a = 0.7982 for the bioinformatics network.

Figure 2: Relation between network size and seed number under (a) traditional k-complex model with k=5, (b) simulated ER graphs and (c)
Figure 3: Relation between network size and seed number in real networks

7 CONCLUSION 

This paper initiates the study of cascading under different evolving network models with generalized influence thresholds. Both the theoretical estimates and the experiments show a counter-intuitive fact: in the ER model, the proportion of initial seeds required decreases as the network evolves. This demonstrates the information diffusion capability of late birds. The results for the PA model show that some later nodes with low degree can take the place of early birds. Future work could study a more general network that combines the ER and PA models, or the situation in which influence diffusion in the network is not instantaneous.